Supporting Data Integrity
Let's understand the downside of using the EAV database design in detail.
When we use EAV, we sacrifice many advantages that a conventional database design would have given us.
We can’t make mandatory attributes#
Returning to the example stated above, to should also require that the date_reported
attribute has a value. In a conventional database design, it would be simple to enforce a mandatory column by declaring the column NOT NULL
.
In the EAV design, each attribute corresponds to a row in the IssueAttributes
table, not a column. You would need a constraint that checks that a row exists for each issue_id
value, and the row must have the string date_reported
in its attr_name
column.
However, SQL doesn’t support a constraint that can do this. So, you must write the application code to enforce it. If you do find a bug with no reported date, should you add a value for this attribute? What value should you give it? If you make a guess or use some default value for a missing attribute, how does that affect the accuracy of your boss’s reports?
You can’t use SQL data types#
Your boss tells you that they are having trouble running their report because people have entered dates in different formats or sometimes even strings that aren’t dates. In a conventional database, you can prevent this if you declare the column with the DATE
data type.
Try it yourself in the following playground.
In the EAV design, the data type of the IssueAttributes.attr_value
column is typically a string to accommodate all possible attributes in a single column. So, it has no way of rejecting invalid data.
Try it yourself in the following playground.
Some people try to extend the EAV design by defining a separate attr_value
column for each SQL data type, leaving null in the unused columns. Doing so would allow you to use data types but it would make queries even worse:
You would need to add even more columns to support user-defined data types or domains.
Enforced referential integrity#
In a conventional database, we can restrict the range of some attributes by defining a foreign key to a lookup table. For example, the status
attribute of a bug or issue should be one of a small list of values stored in the BugStatus
table.
In the EAV design, we can’t apply this kind of constraint on the attr_value
column. A referential integrity constraint applies to every row in the table.
If we define this constraint, it will force every attribute to match a value in BugStatus
, not just the status
attribute.
Let’s try it in the following playground. We will add the given code above and see how it works.
We will see an error, which occurs as a result of using a column ofBugStatus
as a foreign key. It would need to have the same data available in the column status
of the BugStatus
table to be entered in the attr_value
column.
You can’t make up attribute names#
Your boss’ reports are still not reliable. You find that attributes are not being named consistently. One bug uses an attribute named by the string date_ reported
, but another bug names the attribute by the string report_date
. Both are clearly intended to represent the same information.
How would you count the number of bugs per date?
Run the code in the following playground.
How would you know if a given bug hasn’t stored an attribute by yet another name? How would you know if a given bug has stored a given attribute twice by two different names? How can you prevent such mistakes?
One remedy may be to declare a foreign key on the attr_name
column that links to a lookup table that contains your approved attribute names. However, this doesn’t support attributes you define on the fly for each entity, even though that’s a common practice when using the EAV design.